[57] ABSTRACT

Character recognition with an improved recognition ratio is
provided without requiring special operations to be performed
before character recognition is performed. A preliminary
character recognition operation is performed in which the
feature vector of an input character is compared to a
recognition dictionary that contains a reference vector for each
category. The candidate category to which the input character
belongs is determined, and the feature vector and recognition
result for each input character are saved. The input characters
judged to have been recognized with high reliability are
selected with reference to their recognition results. The feature
vector of each selected input character is used to predict a
writer-specific feature vector of a category different from the
candidate category to which the selected input character
belongs. A writer-specific reference vector is then generated
for each category from the writer-specific feature vector of the
category, preferably by using the writer-specific feature vector
to correct the reference vector for the category. A final
character recognition operation is then performed in which the
feature vectors of the input characters are compared with the
writer-specific reference vectors.
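The two-pass scheme summarized in this abstract can be illustrated with the minimal Python sketch below. It is not the patented implementation and rests on several assumptions. The difference value is the Euclidean distance. "High reliability" means a preliminary distance at or below a hypothetical `threshold` (claim 2 selects inputs having a low distance value). The regression of FIG. 3 is simplified to an independent least-squares line per feature dimension, fitted on training writers who each supply one feature vector per category; the regression described in the patent uses a variance-covariance matrix across feature values. Predicted vectors for a category are averaged, and the reference correction equation, which is not reproduced in this section, is replaced by a hypothetical linear blend controlled by `weight`; `weight=1.0` adopts the predicted vector outright, as in claim 3. All function names are hypothetical.

```python
from statistics import mean
import math


def nearest(feature, refs):
    """Return (category, Euclidean distance) of the closest reference vector."""
    return min(((c, math.dist(feature, r)) for c, r in refs.items()),
               key=lambda t: t[1])


def fit_pair(xs, ys):
    """Per-dimension least squares y_k = a_k * x_k + b_k.
    A simplification of a full multivariate regression."""
    coeffs = []
    for k in range(len(xs[0])):
        x = [v[k] for v in xs]
        y = [v[k] for v in ys]
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        a = sxy / sxx if sxx else 0.0
        coeffs.append((a, my - a * mx))
    return coeffs


def fit_regressions(training_writers):
    """training_writers: one {category: feature vector} dict per writer.
    Returns per-dimension coefficients for every ordered category pair (i, j)."""
    cats = list(training_writers[0])
    return {(i, j): fit_pair([w[i] for w in training_writers],
                             [w[j] for w in training_writers])
            for i in cats for j in cats if i != j}


def recognize_two_pass(features, refs, regressions, threshold, weight=1.0):
    """weight=1.0 adopts the averaged prediction as the reference vector;
    0 < weight < 1 is a hypothetical blend with the universal reference."""
    # Pass 1: preliminary recognition against the universal dictionary.
    prelim = [nearest(f, refs) for f in features]
    # Only reliably recognized inputs (small distance) are used to predict
    # writer-specific feature vectors for the *other* categories.
    predicted = {c: [] for c in refs}
    for f, (cat, dist) in zip(features, prelim):
        if dist <= threshold:
            for other in refs:
                if other != cat:
                    ab = regressions[(cat, other)]
                    predicted[other].append([a * x + b for (a, b), x in zip(ab, f)])
    # Average the predictions per category and correct the reference vectors.
    writer_refs = {}
    for c, ref in refs.items():
        if predicted[c]:
            avg = [mean(col) for col in zip(*predicted[c])]
            writer_refs[c] = [(1 - weight) * r + weight * p for r, p in zip(ref, avg)]
        else:
            writer_refs[c] = list(ref)  # no prediction: keep universal reference
    # Pass 2: final recognition with the writer-specific reference vectors.
    return [c for c, _ in prelim], [nearest(f, writer_refs)[0] for f in features]


# Toy data: each training writer shifts both categories by the same offset.
training = [{"a": [s, s], "b": [10 + s, 10 + s]} for s in (-1, 0, 1)]
universal = {c: [mean(col) for col in zip(*(w[c] for w in training))]
             for c in ("a", "b")}
regs = fit_regressions(training)
# The current writer's offset is +4; the third input is an "a" written loosely.
first, final = recognize_two_pass([[4, 4], [14, 14], [5.5, 5.5]],
                                  universal, regs, threshold=6.0)
print(first, final)  # ['a', 'b', 'b'] ['a', 'b', 'a']
```

With this toy data the loosely written third character lies closer to the universal reference for "b" in the first pass; once the references are moved toward the current writer's offset it is recognized as "a". For simplicity the sketch re-recognizes every input, whereas step 28 re-recognizes only the characters not selected in step 25.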

18 Claims, 3 Drawing Sheets 
CHARACTER RECOGNITION METHOD AND APPARATUS USING WRITER-SPECIFIC
REFERENCE VECTORS GENERATED DURING CHARACTER-RECOGNITION
PROCESSING

FIELD OF THE INVENTION

This invention relates to the field of character recognition.
More particularly, the invention relates to a method that
improves the recognition ratio of character recognition by
taking account of font differences and the personal writing
style of the writer.

BACKGROUND OF THE INVENTION

Character recognition is typically implemented in three
stages: preprocessing, feature extraction, and discrimination.
In the preprocessing stage, size normalization of the input
character pattern and noise removal are normally performed.

During the feature extraction stage, multiple feature values
that represent the features of each input character are
extracted from the input character pattern, and a feature vector
representing the feature values is generated. Each feature of
the input character represents a portion of the structure of the
input character. Typical features include the length of a stroke,
the angle of a stroke, and the number of loops. For example,
when the feature is the number of loops, the feature value may
have one of the following values:

0: when the input character is the numeral "1," "2" or "3,"
1: when the input character is the numeral "0," "6" or "9," and
2: when the input character is the numeral "8."

Typically many hundreds of feature values are extracted for
each input character in the input character pattern. The feature
values are represented by a feature vector whose elements each
represent the feature value of one of the features of the input
character. A feature vector has a large number of dimensions,
with 500 dimensions being typical.

In the discrimination stage, the feature vector of each input
character in the input character pattern is compared with a
reference vector for each category. The input character is
determined to belong to the category whose reference vector is
closest to the feature vector of the input character. In
character recognition, each "category" represents one
character. For example, in numeral recognition, a category
exists for each of the characters "0," "1," . . . , "9."

The effectiveness of a character recognition system is
characterized by its "recognition ratio." When character
recognition is performed, one of the following results is
obtained for each input character in the input character
pattern: (1) the category to which the input character belongs
is correctly recognized; (2) the input character is successfully
recognized as belonging to a category, but the category is
incorrect; or (3) the input character is not recognized as
belonging to any category. For example, when the input
character is the numeral "1," result (1) occurs when the input
character is recognized as belonging to the category "1";
result (2) occurs when the input character is erroneously
recognized as belonging to, for example, the category "7";
and result (3) occurs when the category to which the input
character belongs cannot be recognized. The recognition ratio
is the number of character recognition events that generate
result (1) divided by the total number of input characters in
the input character pattern. A successful character recognition
system is one that has a recognition ratio close to unity (or
100%).

The reference vectors are stored in a recognition dictionary.
The recognition dictionary is statistically created from
character patterns obtained from the handwriting of many
people. Before the character recognition system can be used
for handwriting recognition, the recognition dictionary is
created by a number of unspecified writers each handwriting
a predetermined set of characters. The category to which
each of the characters in the set belongs is known. The
feature vectors extracted from the characters in each category
are averaged, and each average vector is stored in the
recognition dictionary as the reference vector for the category.

Because the recognition dictionary just described is created
from the handwriting of unspecified writers, this type of
recognition dictionary can be regarded as a universal
recognition dictionary that can be used to perform character
recognition on the writing of any writer. However, because of
the stylistic differences between writers, the recognition ratio
of a character recognition system employing a universal
recognition dictionary will depend greatly on how closely each
writer's features are represented by the reference vectors
stored in the universal recognition dictionary.

It is known in the prior art to improve the recognition ratio
of a character recognition system by requiring each of the
writers whose handwriting is to be recognized by the system
to hand write a set of predetermined characters to create a
personal recognition dictionary. However, the requirement
that each writer hand write a set of predetermined characters
before character recognition is performed is impractical in a
character-recognition system designed to recognize the
handwriting of many different writers.

Although a character recognition system for handwriting
must tolerate the variations in characters that result from the
system being used by different writers, these variations are
also a primary factor that hinders improving the recognition
ratio of such systems. For example, if the characters in one
category written by one writer resemble the characters in
another category written by another writer, accurate character
recognition of the handwriting of both writers will be
extremely difficult if the same recognition dictionary is used.
To solve this problem, as noted above, conventional prior-art
systems store a personal recognition dictionary for each
writer whose handwriting will be recognized by the system.
The personal recognition dictionary is created by requiring
the writer to hand write a predetermined set of characters
before the system performs character recognition on the
writer's handwriting.

The document Improving Handwritten Character Recognition
Using Personal Writing Characteristics, Transactions of the
Institute of Electronics, Information and Communication
Engineers, Vol. J78-D-II, No. 7, July 1995, discloses methods
for improving character recognition of handwritten characters
when it is not feasible for the person using the system to hand
write a predetermined set of characters before the system
performs character recognition on the writer's handwriting.
See also T. Kawatani, Character Recognition Performance
Improvement Using Personal Handwriting Characteristics,
IEEE 0-8186-7128-9/95 (1995); and T. Kawatani, N. Miyamoto,
Verification of Personal Handwriting Characteristics for
Numerals and its Application to Recognition, 14 Pattern
Recognition Letters, pp. 335-343 (1993). These papers describe
systems in which the number of input characters that are
erroneously recognized (result (2) above) is reduced, but the
techniques described do not necessarily provide an
improvement in the recognition ratio (result (1) above).
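The universal recognition dictionary and the discrimination stage described in this background, together with the recognition ratio, can be sketched in a few lines of Python. This is an illustration under assumptions, not a production design: closeness is measured with the Euclidean distance; result (3) is modelled as a rejection when even the nearest reference vector lies beyond a hypothetical `reject_distance`, since the background does not say how rejection is decided; and the two-element feature vectors, whose first element is the loop count from the example above, are made up.

```python
import math


def build_universal_dictionary(samples):
    """samples: (category, feature vector) pairs handwritten by many unspecified
    writers. Returns {category: average feature vector}, i.e. the reference vectors."""
    sums, counts = {}, {}
    for cat, vec in samples:
        acc = sums.setdefault(cat, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: [s / counts[cat] for s in acc] for cat, acc in sums.items()}


def discriminate(vec, dictionary, reject_distance=None):
    """Return the category whose reference vector is closest (Euclidean).
    Returns None (result (3)) if a reject threshold is given and even the
    closest reference vector is farther away than that threshold."""
    cat, dist = min(((c, math.dist(vec, ref)) for c, ref in dictionary.items()),
                    key=lambda pair: pair[1])
    if reject_distance is not None and dist > reject_distance:
        return None
    return cat


def recognition_ratio(labelled_inputs, dictionary, reject_distance=None):
    """Number of result-(1) events divided by the total number of input characters."""
    correct = sum(1 for true_cat, vec in labelled_inputs
                  if discriminate(vec, dictionary, reject_distance) == true_cat)
    return correct / len(labelled_inputs)


# Hypothetical feature vectors: [number of loops, a second made-up feature].
training = [("0", [1, 1]), ("0", [1, 2]), ("8", [2, 1]), ("8", [2, 1]),
            ("1", [0, 1]), ("1", [0, 1])]
dictionary = build_universal_dictionary(training)
inputs = [("0", [1, 1]), ("8", [2, 2]), ("1", [1, 1])]  # this "1" has a spurious loop
print(recognition_ratio(inputs, dictionary))                        # 2 of 3 correct
print(recognition_ratio(inputs, dictionary, reject_distance=0.75))  # 1 of 3 correct
```

The "1" written with a loop lands nearest the reference vector for "0" (result (2)), which is the kind of writer-dependent confusion the background describes; adding a rejection threshold turns the uncertain "8" into a result (3) instead of improving the ratio.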
The apparatus and method according to the invention use
to advantage two facts about personal handwriting
characteristics described by T. Kawatani and N. Miyamoto in
Verification of Personal Handwriting Characteristics for
Numerals and its Application to Recognition, 14 PATTERN
RECOGNITION LETTERS, pp. 335-343 (1993):

The features of input characters belonging to the same
category written by the same person are similar; and

There is harmony among the features of input characters
written by the same person even in different categories,
i.e., there is a valid or high correlation among the
features of input characters in different categories.

Similar observations apply to printed fonts.

The inventors have recognized that these characteristics of
handwriting enable the feature vectors of a writer's input
characters that are reliably recognized as belonging to a
category to be used to predict, for this writer, a
writer-specific feature vector for each of a number of other
categories. Since these writer-specific feature vectors are
predicted from the feature vectors extracted from the writer's
own input characters, the writer-specific feature vectors are
personal to the writer. An improved recognition ratio can then
be achieved by using the writer-specific feature vectors to
generate writer-specific reference vectors. A final character
recognition operation is then performed using the
writer-specific reference vectors.

In the embodiments of the apparatus and method to be
described below, the writer-specific feature vectors are
predicted using the feature vectors extracted from a first,
specified number of input characters in the input character
pattern. The writer-specific reference vectors are generated
by using the writer-specific feature vectors to correct the
non-writer-specific reference vectors stored in the universal
recognition dictionary, so that the final character recognition
processing is performed using reference vectors that are
adapted to the writer's writing style.

FIG. 1 shows a functional block diagram of one embodiment
of the character recognition apparatus according to the
invention. The apparatus is preferably implemented by suitably
programming a computer or digital signal processor.
Alternatively, the apparatus may be implemented by
constructing the functional blocks shown in FIG. 1 from
suitable small- or large-scale integrated circuits or from
discrete components.

The character recognition apparatus shown in FIG. 1
performs character recognition on input character patterns
supplied by multiple writers without each writer having to
provide any input of a predetermined character set. The
writer on whose handwriting the apparatus performs character
recognition will be called the "current writer." The following
description assumes that the character recognition apparatus
has not previously performed character recognition on any
input character pattern written by the current writer.

The input character pattern 10 is composed of characters
written by the current writer. The current writer may supply
the input character pattern 10 in real time using a suitable
handwriting input device. Alternatively, the input character
pattern may be scanned in from paper or some other medium.
Since there is no need for the writer to input a set of
predetermined characters to the apparatus, the writer may be
elsewhere, or even deceased, when the writer's input character
pattern is input to the apparatus.

The preliminary character recognition block 11 receives the
input character pattern 10 and performs a preliminary
character recognition operation on each input character in the
input character pattern using a reference vector for each
category received from the universal recognition dictionary 12.
The universal recognition dictionary may be a read-only
memory, part of a random-access memory, part of a mass
storage device, such as a hard disk, or some other suitable
storage device. The reference vectors stored in the universal
recognition dictionary are derived from the feature vectors of
sets of predetermined characters handwritten by a number of
unspecified writers.

The preliminary character recognition block 11 preprocesses
each input character and extracts a feature vector for the
input character. The preliminary character recognition block
then compares the feature vector of the input character with
the reference vector for each category to determine a
candidate category for the input character. The candidate
category for each input character may be determined, for
example, by determining a value quantifying the similarity or
distance between the feature vector of the input character and
the reference vector of each category. A difference may be
determined by calculating the Euclidean distance between the
feature vector of the input character and the reference vector
of each category. Alternatively, a similarity S having the range
0<S<1 may be calculated using the equation:

$$S = \frac{(F)(R)}{|F|\,|R|}$$

where F is the feature vector of the input character, R is the
reference vector of the category, (F)(R) is the inner product
between the feature vector and the reference vector, and |F|
and |R| are the magnitudes of F and R.

The resulting distance or similarity values between the
feature vector of the input character and the reference vector
of each category are then compared with one another. The
category having the greatest similarity value or the smallest
difference value is determined to be the candidate category.
For simplicity, the following description will refer only to the
difference value and will no longer refer to a similarity value
as an alternative. The term "difference value" is to be
understood to encompass the term "similarity value" as an
alternative. Also, the term "smallest difference value" is to
be understood to encompass the term "largest similarity
value" as an alternative. Finally, the term "distance value
smaller than" is to be understood to encompass the term
"similarity value greater than" as an alternative.

The preliminary character recognition block 11 generates a
recognition result and communicates the recognition result
and the feature vector for each input character to the
character information store 13. The recognition result is
composed of the candidate category of the input character and
the distance value quantifying the distance between the feature
vector of the input character and the reference vector of the
candidate category. The character information store 13 stores
the feature vector and recognition result for each input
character in the input character pattern input by the current
writer. The character information store may be part of a
random-access memory, part of a mass storage device, such
as a hard disk, or some other suitable storage device. The
character information store and the universal recognition
dictionary may be parts of the same physical device.

Full operation of the recognized character selection block
14, the writer-specific feature vector prediction block 15, the
writer-specific reference vector generation block 16, the
final character recognition block 17 and the category output
block 18 is preferably delayed until the preliminary character
recognition block 11 has generated a recognition result
process is performed on the input characters of the input
character pattern using a universal recognition dictionary
similar to the universal recognition dictionary 12 shown in
FIG. 1. The universal recognition dictionary includes a
reference vector for each category. The reference vectors are
determined by a large number of unspecified writers each
hand writing a predetermined set of characters, as described
above. In step 23, the feature vector and a recognition result
are stored for each input character. The recognition result may
include the candidate category determined by the preliminary
character recognition process, and the distance value
between the feature vector of the input character and the
reference vector of the candidate category.
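Steps 22 and 23 (the work done by the preliminary character recognition block 11 and the character information store 13) can be sketched as follows. The sketch assumes the difference value is the Euclidean distance and takes the similarity alternative to be the normalized inner product (F)(R)/(|F||R|), which is an assumed reading of the similarity S described for block 11. `RecognitionResult`, `similarity` and `preliminary_recognition` are hypothetical names, and the two-category dictionary is toy data.

```python
import math
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """What is kept for each input character after the preliminary pass."""
    feature: list      # feature vector of the input character
    candidate: str     # candidate category
    distance: float    # Euclidean distance to the candidate's reference vector


def similarity(f, r):
    """Normalized inner product (F)(R) / (|F| |R|); 0.0 if either vector is zero."""
    norms = math.sqrt(sum(x * x for x in f)) * math.sqrt(sum(x * x for x in r))
    return sum(a * b for a, b in zip(f, r)) / norms if norms else 0.0


def preliminary_recognition(features, dictionary, use_similarity=False):
    """Pick a candidate category for each input and store a recognition result."""
    results = []
    for f in features:
        if use_similarity:
            # The largest similarity value wins.
            cat = max(dictionary, key=lambda c: similarity(f, dictionary[c]))
        else:
            # The smallest distance value wins.
            cat = min(dictionary, key=lambda c: math.dist(f, dictionary[c]))
        results.append(RecognitionResult(f, cat, math.dist(f, dictionary[cat])))
    return results


dictionary = {"1": [1.0, 0.0], "7": [0.0, 1.0]}  # toy reference vectors
store = preliminary_recognition([[0.9, 0.1], [0.2, 0.8]], dictionary)
print([(r.candidate, round(r.distance, 3)) for r in store])  # [('1', 0.141), ('7', 0.283)]
```

Storing the distance alongside the candidate category is what later allows reliably recognized characters to be picked out by a low distance value, as recited in claim 2.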



Step 24 determines whether the condition for advancing to
the next step has been met. The condition is whether the
preliminary character recognition process has been performed
on a specific number of input characters of the input
character pattern. Alternatively, the condition may be
whether a specific condition, such as recognizing characters
with a high reliability, has been met. If the condition is met
in step 24, execution advances to step 25. Otherwise,
execution returns to step 22.

In step 25, the recognition results stored in step 23 are
analyzed to select the input characters that have been
recognized with high reliability. The feature vector and
recognition result for each selected character are passed to
step 26. The feature vector and recognition result for the
remaining characters are passed to step 28.

In step 26, the feature vector of each input character
selected in step 25 is subjected to the regression equation
described above to predict a writer-specific feature vector for
each of multiple categories other than the candidate category
allocated in the preliminary character recognition process
performed in step 22. In step 27, a writer-specific reference
vector is generated for each category by correcting the
reference vectors using the writer-specific feature vectors
predicted in step 26. The correction is preferably performed
using the reference correction equation described above.

In step 28, a final character recognition process is
performed on the input characters not selected in step 25. In
step 28, the final character recognition process is performed
using the writer-specific reference vector for each category
generated in step 27. The category of each input character is
output in step 29.

FIG. 3 is a flow chart showing the method for determining
the regression coefficients of the regression equation used to
predict a writer-specific feature vector for each of multiple
alternative categories from the feature vector of an input
character. The regression equation is applied at step 26 in the
method shown in FIG. 2, for example. The method begins by
setting the required initial values in step 30. Standard test
patterns handwritten by many people are then input in step
31. Preprocessing, such as normalizing the position, tilt, and
size, is then performed on each test pattern in step 32. The
feature vectors are then extracted from the input characters
in each preprocessed test pattern in step 33. Since the
category of each input character in the test patterns is already
known, the feature vector extracted for a given input
character can be said to be the feature vector for the category
to which the input character belongs.

In step 34, a regression analysis between pairs of
categories is performed using the feature vector extracted in
step 33 for each category in the pair of categories. In the
regression analysis, the feature vectors extracted for each
category from all the preprocessed test patterns are averaged
to determine an average feature vector for the category. The
average feature vectors for the two categories constituting the
category pair are then used to determine a
variance-covariance matrix of each category pair. The
variance-covariance matrix represents the scatter of each
feature value in the category pair, and the correlation between
the different feature values. Conditional step 35 checks
whether a regression coefficient has been determined for all of
the category pairs. If this condition is not met, execution
returns to step 34, where the regression analysis is repeated
for another category pair. If the condition is met, execution
stops.

Although the character recognition apparatus and method
are described in connection with their use to perform
character recognition on handwritten characters, the invention
may also be applied to any of a wide variety of printed texts
in different fonts, as desired. The invention may also be
implemented using a wide variety of manual platforms,
automated computer platforms, nodes, or networks, or any
combination thereof, as desired.

Although the present invention has been described in
detail with reference to a particular preferred embodiment,
persons of ordinary skill in the art to which this invention
pertains will appreciate that various modifications and
enhancements may be made without departing from the
scope of the claims that follow.

We claim:

1. A method for recognizing characters, the method
comprising steps of:

providing a set of reference vectors, each of the reference
vectors representing a category, the reference vectors
being derived from writing samples provided by
non-specified writers;

performing a preliminary character recognition operation
on plural input characters to make a preliminary
determination of the category to which each of the input
characters belongs by comparing a feature vector of the
one of the input characters to the reference vectors to
determine a candidate category and to generate a
recognition result;

with reference to the recognition results for the plural
input characters, selecting each of the input characters
that has been recognized with high reliability as a
selected input character;

predicting, from the feature vector of each selected input
character, a writer-specific feature vector for at least
one category other than the candidate category of the
selected input character;

generating a writer-specific reference vector for each
category using the writer-specific feature vector
predicted for the category; and

performing a final character recognition operation using
the writer-specific reference vectors to make a final
determination of the category to which each of the
input characters belongs.

2. The method of claim 1, in which:

the recognition result for each input character includes a
distance value between the feature vector of the input
character and the reference vector of the candidate
category; and

the step of selecting each input character that has been
recognized with high reliability includes a step of
selecting an input character having a low distance value
as the selected input character.

3. The method of claim 1, in which, in the step of
generating a writer-specific reference vector, the
writer-specific reference vector for each category is generated
by adopting the writer-specific feature vector predicted for the
category as the writer-specific reference vector for the
category.
writer-specific feature vector generating means for
generating a writer-specific feature vector for each
category from the feature vectors of the selected input
characters, the writer-specific feature vector generating
means generating plural writer-specific feature vectors
for at least one category, and including means for
averaging the plural writer-specific feature vectors to
obtain a writer-specific feature vector for the at least
one category, and

means for deriving the writer-specific reference vector
for each category from the writer-specific feature
vector for the category.

***** 